Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks
Abstract
OBJECTIVE Traditional Chinese medicine (TCM) is a unique and complex medical system that has developed over thousands of years. This article studies the problem of automatically extracting meaningful relations between entities from TCM literature, with the aims of assisting clinical treatment or poly-pharmacology research and promoting the understanding of TCM in Western countries. METHODS Instead of extracting each relation separately from a single sentence or document, we propose to extract multiple types of relations (e.g., herb-syndrome, herb-disease, formula-syndrome, formula-disease, and syndrome-disease relations) collectively and globally from the entire corpus of TCM literature, from the perspective of network mining. We first constructed heterogeneous entity networks from the TCM literature, in which each edge is a candidate relation, and then used a heterogeneous factor graph model (HFGM) to simultaneously infer the existence of all the edges. We also employed a semi-supervised learning algorithm to estimate the model's parameters. RESULTS We applied our method to extract relations from a large dataset consisting of more than 100,000 TCM article abstracts. Our results show that the HFGM performed significantly better than a traditional support vector machine (SVM) classifier at extracting all types of relations from TCM literature, increasing the average precision by 11.09%, the recall by 13.83%, and the F1-measure by 12.47% across the different types of relations. CONCLUSION This study exploits the power of collective inference and proposes an HFGM based on heterogeneous entity networks, which significantly improved our ability to extract relations from TCM literature.
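The network-construction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how candidate edges are generated, so this sketch assumes that two entities become a candidate edge when they co-occur in the same abstract and their types match one of the five listed relation types. The function name `build_candidate_network`, the input layout, and the placeholder entity names are all hypothetical, and the sketch covers only candidate generation, not the HFGM inference or the semi-supervised parameter estimation.

```python
from collections import Counter
from itertools import combinations

# The five relation types named in the abstract, as unordered type pairs.
RELATION_TYPES = {
    frozenset({"herb", "syndrome"}),
    frozenset({"herb", "disease"}),
    frozenset({"formula", "syndrome"}),
    frozenset({"formula", "disease"}),
    frozenset({"syndrome", "disease"}),
}


def build_candidate_network(documents):
    """Count candidate edges between typed entities that share a document.

    `documents` is an iterable of collections of (entity_type, name) pairs.
    Using co-occurrence as the edge criterion is an assumption of this
    sketch, not something stated in the abstract.
    """
    edges = Counter()
    for entities in documents:
        # Sorting the deduplicated entities gives each edge one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            if frozenset({a[0], b[0]}) in RELATION_TYPES:
                edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Placeholder entities only; these are not real TCM data.
    docs = [
        [("herb", "herb_A"), ("syndrome", "syndrome_X"), ("disease", "disease_Y")],
        [("herb", "herb_A"), ("disease", "disease_Y"), ("formula", "formula_F")],
    ]
    for (a, b), count in sorted(build_candidate_network(docs).items()):
        print(a, b, count)
```

Each key of the returned counter is one candidate relation, which a downstream model would then accept or reject. The co-occurrence count is kept only as an example of a feature such a model might use.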
Similar resources
TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining
BACKGROUND Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East Asian countries. In recent years, many herbal medicines were found to exhibit a variety of effects through regulating a wide range of gene expressions or protein activities. As available TCM data continue to a...
Context-aware Modeling for Spatio-temporal Data Transmitted from a Wireless Body Sensor Network
Context-aware systems must be interoperable and work across different platforms at any time and in any place. Context data collected from wireless body area networks (WBAN) may be heterogeneous and imperfect, which makes their design and implementation difficult. In this research, we introduce a model which takes the dynamic nature of a context-aware system into consideration. This model is con...
sGRAPH: A Domain Ontology-driven Semantic Graph Auto Extraction System
This paper presents sGRAPH – a domain ontology-driven semantic graph auto extraction system used to discover knowledge from text publications in traditional Chinese medicine. The traditional Chinese medicine language system (TCMLs), composed of an ontology schema and a knowledge base containing 153,692 words and 304,114 relations, is used as the domain ontology. The sGRAPH comprises two compone...
Anti-inflammatory effect of Yu-Ping-Feng-San via TGF-β1 signaling suppression in rat model of COPD
Objective(s): Yu-Ping-Feng-San (YPFS) is a classical traditional Chinese medicine that is widely used for treatment of the diseases in respiratory systems, including chronic obstructive pulmonary disease (COPD) recognized as chronic inflammatory disease. However, the molecular mechanism remains unclear. Here we detected the factors involved in transforming growth factor beta 1 (TGF-β1)/Smad2 si...
A Trainable Method For Extracting Chinese Entity Names And Their Relations
In this paper we propose a trainable method for extracting Chinese entity names and their relations. We view the entire problem as a series of classification problems and employ memory-based learning (MBL) to resolve them. Preliminary results show that this method is efficient and flexible, and that it promises better performance than other existing methods.
Journal title:
- Journal of the American Medical Informatics Association : JAMIA
Volume 23, Issue 2
Pages -
Publication date: 2016